-
-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
!!! TASK: Split dimensionspacepoints into separate table to reduce data duplication #4790
Conversation
038f0df
to
4274928
Compare
2239ab0
to
6bef019
Compare
This should reduce stored data amounts, also make queries and workspace copies faster. The hierarchy relations are now entirely limited size field types and should not need external lookup like "text" fields do.
6bef019
to
3cf14ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good (I was just skimming over it on my phone), thanks for taking care!
Apart from the inline comments I wonder:
We assume, that we have a performance issue and that it is solved with this PR. I have no reason to doubt this, but did you measure that by any chance?
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/ContentGraph.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great already, just one more question/comment.
But here is one example outcome on my local system
Impressive, almost 30% speed improvement \o/
@kitsunet thanks for going the extra mile and providing a basic benchmark.
Data size I cannot verify yet
I don't think, that's that important really.. it will most probably be in the same region
Neos.ContentGraph.DoctrineDbalAdapter/src/DoctrineDbalContentGraphProjection.php
Outdated
Show resolved
Hide resolved
Neos.ContentGraph.DoctrineDbalAdapter/src/DoctrineDbalContentGraphSchemaBuilder.php
Show resolved
Hide resolved
@@ -67,8 +67,8 @@ public function hierarchyIntegrityIsProvided(): Result | |||
} | |||
|
|||
$invalidlyHashedHierarchyRelationRecords = $this->client->getConnection()->executeQuery( | |||
'SELECT * FROM ' . $this->tableNamePrefix . '_hierarchyrelation | |||
WHERE dimensionspacepointhash != MD5(dimensionspacepoint)' | |||
'SELECT * FROM ' . $this->tableNamePrefix . '_hierarchyrelation h LEFT JOIN ' . $this->tableNamePrefix . '_dimensionspacepoints dsp ON dsp.hash = h.dimensionspacepointhash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the purpose for this check is. But after it's not fully the same as before. So we just check if the entry for a given hash is present in _dimensionspacepoints
. But before we also checked if the md5-hashed value is equal to the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm, we can't know if the hash is the right one though, either it exists in the DSP table or it's invalid, if it exists in there IMHO we can assume it matches the dimension coordinates. We could add a separate check that reads all DSP entries and compares hash and getting the hash from the coordinates in the same row if we really want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with keeping it as it is. Just wanted to point out this difference.
...tentGraph.DoctrineDbalAdapter/src/Domain/Projection/ProjectionIntegrityViolationDetector.php
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me by reading!
Bastians pr should go first ;) |
Neos.ContentGraph.DoctrineDbalAdapter/src/Domain/Repository/DimensionSpacePoints.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow thank you ^^
Interestingly the speed gain is enormous with my 270 events of the neos demo + some extra stuff
Command | Before (average of 3) | After (average of 3) |
---|---|---|
time ./flow cr:projectionReplayAll --force |
~21s | ~1.5s |
time ./flow cr:projectionReplay contentGraph --force |
~29s | ~5s |
(the reason why there is a difference between all and one is some odd db lock, and also somehow this change avoids running into the lock state for a projectionReplayAll
(previously the db had 95% cpu intensity))
$coordinates = $this->getCoordinatesByHashFromRuntimeCache($hash); | ||
if ($coordinates === null) { | ||
$this->fillRuntimeCacheFromDatabase(); | ||
$coordinates = $this->getCoordinatesByHashFromRuntimeCache($hash); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha this is a bit hacky as we dont pass this repository as singleton that means it cannot share its runtime cache and would otherwise not pick up on new OriginDimensionSpacePoints, but i consider this a very safe runtime cache as nothing can go wrong (despite it being outdated, in which case we refetch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, we could even make this static if we wanted LOL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah its fine thx, i guess asking the db like this is good enough and it will also work for async / parallel
Neos.ContentGraph.DoctrineDbalAdapter/src/DoctrineDbalContentGraphProjectionFactory.php
Outdated
Show resolved
Hide resolved
The reading side now shares one instance and the writing side uses the repository transactional with the respective other entry written (NodeRecord / HierachyRelation) |
$tableNamePrefix, | ||
$nodeAggregateId, | ||
$originDimensionSpacePoint, | ||
$originDimensionSpacePointHash, | ||
$properties, | ||
$nodeTypeName, | ||
$classification, | ||
$timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i love php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement, its a nice bike.
A just a noob question @kitsunet Why don’t we join but handle that separately? I guess performance and simplicity? |
return $table | ||
->addIndex(['childnodeanchor']) | ||
->addIndex(['contentstreamid']) | ||
->addIndex(['parentnodeanchor']) | ||
->addIndex(['contentstreamid', 'childnodeanchor', 'dimensionspacepointhash']) | ||
->setPrimaryKey(['childnodeanchor', 'contentstreamid', 'dimensionspacepointhash', 'parentnodeanchor']) | ||
->addIndex(['contentstreamid', 'dimensionspacepointhash']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol after debugging i found out that this change here slows down the replay of ContentStreamWasForked
events dramatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #5009
This should reduce stored data amounts, also make queries and workspace copies faster.
The hierarchy relations are now entirely limited size field types and should not
need external lookup like "text" fields do.
Note that this needs a
./flow cr:setup
and./flow cr:projectionreplay --projection contentGraph